In this tutorial I will show you how to remove all special characters and punctuation, except spaces, from a string in Python.
The following program extracts data from a URL using the BeautifulSoup package. If the title tag contains special characters, I want to remove them.
CODE:
import string
import urllib.request

from bs4 import BeautifulSoup
from docx import Document

def remove_symbols(title):
    # Translation table that deletes every character in string.punctuation
    trans = str.maketrans("", "", string.punctuation)
    cleaned_title = title.translate(trans)
    return cleaned_title

# Send a User-Agent header with the request
hdr = {"User-Agent": "My Agent"}
request = urllib.request.Request(
    url='https://tensix.com/oracle-bi-publisher-installation-error-inst-05058-a-lookup-of-the-address-for-this-machine/',
    headers=hdr)
f = urllib.request.urlopen(request)
myfile = f.read()

# Parse the page and read the text of its <title> tag
soup = BeautifulSoup(myfile, 'html.parser')
title = soup.title.text.strip()

doc = Document()
doc.add_heading(title, 1)

cleaned_title = remove_symbols(title)
print(cleaned_title)
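However, the code above does not remove everything: str.translate with string.punctuation deletes only ASCII punctuation, so non-ASCII symbols such as an en dash or curly quotes remain in the title. I am going to use a regex instead, keeping only letters, digits, and spaces.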
CODE:
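import re

def remove_symbols(title):
    # Replace every run of characters other than letters and digits with one space.
    # The pattern also matches newlines, so the title does not need to be split first.
    return re.sub(r"[^a-zA-Z0-9]+", " ", title).strip()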
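To see how the two approaches differ without fetching a page, here is a minimal sketch that runs both on one string. The sample title and the function names remove_with_translate and remove_with_regex are made up for this illustration; they are not part of the program above.

```python
import re
import string

# Hypothetical sample title (not fetched from any page); "\u2013" is an en dash.
SAMPLE_TITLE = "Error: INST-05058 \u2013 lookup failed!"


def remove_with_translate(title):
    """Delete only the ASCII characters listed in string.punctuation."""
    return title.translate(str.maketrans("", "", string.punctuation))


def remove_with_regex(title):
    """Replace each run of characters other than letters and digits with one space."""
    return re.sub(r"[^a-zA-Z0-9]+", " ", title).strip()


print(remove_with_translate(SAMPLE_TITLE))  # Error INST05058 – lookup failed
print(remove_with_regex(SAMPLE_TITLE))      # Error INST 05058 lookup failed
```

The translate version leaves the en dash in place and joins "INST" and "05058" because it deletes the hyphen outright. The regex version replaces each unwanted run with a space, so the words stay separated. Both keep the digits; if you also want to drop numbers, removing 0-9 from the character class should do it.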